Abstract
Context representations are a key element in distributional models of word meaning. A recently proposed approach represents the context of a target word by a substitute vector. In this work, the authors propose a variant of substitute vectors that is suitable for measuring context similarity, and then a novel model for representing word meaning in context based on this context representation.
Introduction
A context of a word instance is typically represented by an unordered collection of its first-order neighboring words, called bag-of-words (BOW). In contrast, Yatbaz et al. (2012) proposed to represent this context as a second-order substitute vector. The main contribution of this paper is a model for word meaning in context that is based on substitute vector context representations instead of the traditional bag-of-words representations.
Modeling word meaning
Word meaning out-of-context
They define the out-of-context representation for target word type $u$ as an average of the substitute vectors of its contexts:
$\overrightarrow{p_u} = \frac{1}{|C_u|} \sum_{i \in C_u} \overrightarrow{S_i}$
where $C_u$ is a collection of the contexts observed for target word type $u$ in a learning corpus, and $\overrightarrow{S_i}$ are their substitute vectors.
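A minimal sketch of this averaging, assuming the substitute vectors of the observed contexts are already computed and stacked as rows of a NumPy array (the names are illustrative, not from the paper):

```python
import numpy as np

def out_of_context_representation(substitute_vectors):
    """Average the substitute vectors of all observed contexts of a target word.

    substitute_vectors: array of shape (|C_u|, V), one substitute vector per
    context of the target word type u observed in the learning corpus.
    """
    return np.asarray(substitute_vectors, dtype=float).mean(axis=0)
```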
Word meaning in context
They would like to alter the out-of-context representation by, ideally, averaging only over contexts that induce a word sense similar to that of the given context.
To approximate this objective, they use a weighted average of all contexts of $u$, where contexts are weighted according to their similarity to the given context $c$:
$\overrightarrow{p_{u,c}} = \frac{1}{Z} \sum_{i \in C_u} sim(c,i) \cdot \overrightarrow{S_i}$
where $Z = \sum_{i \in C_u} sim(c,i)$ normalizes the weights so that this is a proper weighted average.
Compared to the out-of-context representation, this one is sensitive to the context similarity scores, which bias the representation towards the given context. It is in-context in the sense that the given context is used to build the vector.
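A minimal sketch of this weighted average, assuming the substitute vectors and the similarity scores $sim(c,i)$ have already been computed (array names are illustrative):

```python
import numpy as np

def in_context_representation(substitute_vectors, similarities):
    """Similarity-weighted average of the substitute vectors of u's contexts.

    substitute_vectors: array of shape (|C_u|, V).
    similarities: array of shape (|C_u|,) holding sim(c, i) of the given
    context c against every stored context i of u.
    """
    weights = np.asarray(similarities, dtype=float)
    vectors = np.asarray(substitute_vectors, dtype=float)
    Z = weights.sum()                        # normalize over the weights
    return (weights[:, None] * vectors).sum(axis=0) / Z
```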
Evaluating Context Representations
Task Description
Given a word-window context $c$ of a target word $u$, they wish to evaluate context similarity measures on their ability to retrieve other contexts of $u$ from $C_u$ that induce a similar sense.
Performing such an evaluation requires a dataset of target words with thousands of sense-tagged contexts in $C_u$ for each target word $u$.
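As a rough illustration of this retrieval setup, assuming each context is represented by a vector such as its substitute vector, with cosine similarity standing in for one of the compared similarity measures and precision-at-k as an illustrative retrieval metric (the metric here is a choice made for the sketch, not necessarily the paper's):

```python
import numpy as np

def rank_contexts(query_vec, context_vecs):
    """Rank stored contexts of u by cosine similarity to the query context."""
    q = query_vec / np.linalg.norm(query_vec)
    C = context_vecs / np.linalg.norm(context_vecs, axis=1, keepdims=True)
    return np.argsort(-(C @ q))              # most similar context first

def precision_at_k(ranking, sense_tags, query_sense, k=100):
    """Fraction of the top-k retrieved contexts that share the query's sense."""
    return float(np.mean([sense_tags[i] == query_sense for i in ranking[:k]]))
```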
Pseudo-word methods
Because no sense-tagged dataset of this size is available, they use a pseudo-word method, which considers a set of real words as the pseudo-senses of an artificial pseudo-word.
- pseudo-word: Sample from the learning corpus.
- pseudo-senses: For the pseudo-word, use WordNet to identify all of the word's synsets, and for each choose the least polysemous word which occurs at least 1,000 times in the corpus as one of the pseudo-sense words (sketched below).
- Mixed contexts: Sample from the corpus 1,000 contexts of each pseudo-sense of a pseudo-word.
- query context: Sample a single context from the mixed contexts and then rank the remaining contexts according to each of the compared context similarity measures.
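A minimal sketch of the pseudo-sense selection step, assuming NLTK's WordNet interface; `corpus_freq`, a dict mapping words to their counts in the learning corpus, is a hypothetical input:

```python
from nltk.corpus import wordnet as wn

def pseudo_sense_words(word, corpus_freq, min_count=1000):
    """For each WordNet synset of `word`, pick the least polysemous lemma
    that occurs at least `min_count` times in the learning corpus."""
    senses = set()
    for synset in wn.synsets(word):
        candidates = [lemma.name() for lemma in synset.lemmas()
                      if corpus_freq.get(lemma.name(), 0) >= min_count]
        if candidates:
            # fewest WordNet synsets == least polysemous
            senses.add(min(candidates, key=lambda w: len(wn.synsets(w))))
    return senses
```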
- Sample 100 words randomly from the learning corpus, ukWaC.
- Use WordNet to identify all of each word's synsets.
- For each synset, choose the surface word which is the least polysemous yet occurs in the learning corpus at least 1,000 times, as a representative for this synset.
- Create a pseudo-word whose pseudo-senses are the set of the representative words.
- Sample from the learning corpus 1,000 contexts for each pseudo-sense word, and for each pseudo-word mix together all contexts of its pseudo-sense words; record the original pseudo-sense word of each context as its sense tag (see the sketch after this list).
- For each pseudo-word, sample a single query context from all of its mixed contexts and then rank the remaining contexts according to each of the compared context similarity measures.
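Pulling the last two steps together, a sketch of how the mixed, sense-tagged context pool and the query context might be constructed; `sample_contexts` is a hypothetical helper that draws contexts of a word from the learning corpus, not something from the paper:

```python
import random

def build_pseudo_word_dataset(sense_words, sample_contexts, n_per_sense=1000):
    """Mix the contexts of all pseudo-sense words into one sense-tagged pool
    and hold out a single query context.

    sense_words: the set of real words acting as senses of one pseudo-word.
    sample_contexts: hypothetical helper returning n contexts of a word.
    """
    mixed = []
    for sense in sense_words:
        for context in sample_contexts(sense, n_per_sense):
            mixed.append((context, sense))   # the source word is the sense tag
    random.shuffle(mixed)
    query, pool = mixed[0], mixed[1:]        # one query context vs. the rest
    return query, pool
```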